Evaluating the Link between Word Frequencies and Pronunciation Variants: a Cross-lingual Study on Read and Spontaneous Speech
Abstract
The aim of this contribution is twofold: evaluating the use of pronunciation variants in read and spontaneous speech, and studying the link between word frequencies and pronunciation variants. The dependence of pronunciation variants on a given system configuration is also addressed in the first part. For the second aspect of this work, different variant types are defined. A cross-lingual study is carried out for both read and spontaneous speech in French and American English using the following corpora: BREF [2], MASK [3], WSJ [4], ARPA-HUB4 [5].

1 Evaluating the use of pronunciation variants

Adding pronunciation variants to a recognition system's lexicon is a means of increasing the acoustic modeling options for the words concerned. The additional variants are expected to improve the recognizer's decoding accuracy, provided they concern potential error regions. However, if the type of variants is inappropriate (not relevant) with respect to the recognizer's weaknesses, or if the number of variants is too high, the recognizer's overall performance may decrease. How often have new pronunciation variants, added to solve a given decoding problem, proved globally ineffective? While solving the problem for which they were designed, the variants may introduce new errors elsewhere, canceling the local benefit. As variants often increase the homophone rate, they are potential error sources. Furthermore, a large increase in the number of variants can degrade decoding performance by raising computational requirements. Variants are thus introduced carefully in our speech recognition systems.

In this contribution we address the use of pronunciation variants during speech/transcription alignment in different system configurations (using different transcription lexica and different acoustic model sets). The aim is to evaluate the number and type of the observed pronunciation variants as a function of a particular system configuration.
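The remark that variants often increase the homophone rate can be illustrated with a small sketch. The definition used here (the share of words that share at least one pronunciation with another word), the function name `homophone_rate`, and the toy lexica are assumptions made for illustration; the source does not specify how the rate is computed.

```python
from collections import defaultdict


def homophone_rate(lexicon):
    """Share of words sharing at least one pronunciation with another word.

    `lexicon` maps each word to a list of pronunciations (phone strings).
    This definition is an assumption, not taken from the paper.
    """
    words_by_pron = defaultdict(set)
    for word, prons in lexicon.items():
        for pron in prons:
            words_by_pron[pron].add(word)

    homophones = set()
    for words in words_by_pron.values():
        if len(words) > 1:  # the same phone string serves several words
            homophones |= words
    return len(homophones) / len(lexicon)


# Hypothetical toy lexica: adding a reduced variant of "and"
# makes it collide with "an".
base = {"and": ["ae n d"], "an": ["ae n"], "in": ["ih n"]}
with_variants = {"and": ["ae n d", "ae n"], "an": ["ae n"], "in": ["ih n"]}

print(homophone_rate(base))           # 0.0
print(homophone_rate(with_variants))  # 2 of 3 words are now homophones
```

In this toy case a single added variant turns two of the three words into homophones, which is the kind of side effect the text warns about.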
For this study three (four) pronunciation lexica are used (corresponding to the same orthographic word list):
- LEX1: standard lexicon for LVSR (5-10% of variants)
- LEX1': no variants (full form pronunciations of LEX1)
- LEX2: lexicon with a large number of variants (40-60%)
- LEX3: lexicon with a very large number of variants (80-100%)
Variants in LEX2 and LEX3 are derived semi-automatically from phone recognition experiments. LEX1' without variants or LEX1 (standard recognition lexicon) can be used as reference lexicon to measure the occurrence rate of additional variants from LEX2 and LEX3. Acoustic models are trained for each lexicon. For a given lexicon (LEX2, LEX3), alignment is carried out using the different acoustic model sets …
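As a rough sketch of the two quantities mentioned above, the code below computes a variant percentage for a lexicon and the occurrence rate of additional variants in an alignment relative to a reference lexicon. Both definitions are assumptions: the variant rate is read here as extra pronunciations per word, and the occurrence rate as the share of aligned tokens whose chosen pronunciation is absent from the reference lexicon. The function names and toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Lexicon = Dict[str, List[str]]  # word -> list of pronunciations (phone strings)


def variant_rate(lexicon: Lexicon) -> float:
    """Extra pronunciations per word: (pronunciations - words) / words.

    Assumed reading of figures such as "40-60%"; not confirmed by the source.
    """
    n_words = len(lexicon)
    n_prons = sum(len(set(prons)) for prons in lexicon.values())
    return (n_prons - n_words) / n_words


def additional_variant_rate(aligned: List[Tuple[str, str]],
                            reference: Lexicon) -> float:
    """Share of aligned tokens whose pronunciation is not in `reference`."""
    extra = sum(1 for word, pron in aligned
                if pron not in reference.get(word, []))
    return extra / len(aligned)


# Hypothetical toy lexica standing in for LEX1' (no variants) and LEX2.
lex1_prime = {"the": ["dh ax"], "and": ["ae n d"], "data": ["d ey t ax"]}
lex2 = {"the": ["dh ax", "dh iy"], "and": ["ae n d", "ax n"],
        "data": ["d ey t ax"]}

# Hypothetical alignment output with LEX2: (word, pronunciation chosen).
aligned = [("the", "dh iy"), ("data", "d ey t ax"),
           ("and", "ae n d"), ("the", "dh ax")]

print(variant_rate(lex1_prime))                      # 0.0
print(variant_rate(lex2))                            # 2 extra variants / 3 words
print(additional_variant_rate(aligned, lex1_prime))  # 0.25
```

Using the variant-free lexicon as the reference means every non-canonical pronunciation chosen during alignment counts as an additional variant; using LEX1 instead would count only variants beyond the standard recognition lexicon.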
Similar resources
Pronunciation variants across system configuration, language and speaking style
This contribution aims at evaluating the use of pronunciation variants for different recognition system configurations, languages and speaking styles. This study is limited to the use of variants during speech alignment, given an orthographic transcription of the utterance and a phonemically represented lexicon, and is thus focused on the modeling capabilities of the acoustic word models. To meas...
Pronunciation Variants Across Systems, Languages and Speaking Style
This contribution aims at evaluating the use of pronunciation variants across different system configurations, languages and speaking styles. This study is limited to the use of variants during speech alignment, given an orthographic transcription and a phonemically represented lexicon, thus focusing on the modeling abilities of the acoustic word models. Parallel and sequential variants are tes...
Pronunciation variant analysis using speaking style parallel corpus
To improve the recognition accuracy for spontaneous conversational speech, we collected a corpus to study how spontaneous conversational speech differs from read style speech. The corpus consists of two parts: 1) spontaneous conversational speech and 2) read speech with the same word transcriptions as the conversational speech. In word and phone recognition experiments, it was confirmed that, f...
Adapting the acoustic model of a speech recognizer for varied proficiency non-native spontaneous speech using read speech with language-specific pronunciation difficulty
This paper presents a novel approach to acoustic model adaptation of a recognizer for non-native spontaneous speech in the context of recognizing candidates’ responses in a test of spoken English. Instead of collecting and then transcribing spontaneous speech data, a read speech corpus is created where non-native speakers of English read English sentences of different degrees of pronunciation d...
Comparison between Expert Listeners and Continuous Speech Recognizers in Selecting Pronunciation Variants
In this paper, the performance of an automatic transcription tool is evaluated. The transcription tool is a continuous speech recognizer (CSR) which can be used to select pronunciation variants (i.e. detect insertions and deletions of ph...